Lecture - MIT MAS962 Computational semantics

Greg Detre

@10 on Tuesday, October 01, 2002

Federica Busa present

 

assignment number 2

 

Brian, WordNet presentation

started in 1985, $3m funding (apparently mainly from the CIA etc.), small group of lexicographers

100,000+ entries

of the 75,000 synsets, c. 10,000 have meronym links

antonymy is a lexical, rather than semantic, relation, e.g. rise/descend aren't antonyms (though rise/fall are)

antonyms:

21% of adjectives, 9% of verbs, 2.4% of nouns

these proportions are very small, given that people can enumerate antonyms for more or less anything (lots that we can think of don't exist in WordNet, e.g. cat/dog, bachelor)

ones that do exist, e.g. success/failure, best/worst

you could always define non-X as the antonym of every noun :) (Deb)

WordNet stores adjectives more or less in terms of their antonyms
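(A quick way to see the lexical-not-semantic point, using the much later NLTK interface to WordNet - a sketch, sense numbers assumed: antonym links hang off Lemma objects, i.e. word forms, not off Synsets, i.e. concepts.)

    from nltk.corpus import wordnet as wn

    # Antonymy is a relation between lemmas (word forms), not synsets
    # (concepts), so we descend from the synset to its lemmas to find it:
    for lemma in wn.synset('good.a.01').lemmas():
        print(lemma.name(), '->', [ant.name() for ant in lemma.antonyms()])
    # good -> ['bad']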

hyponymy/hypernymy

the is-a relation as WordNet's killer app

does it distinguish sub-class-of vs instance-of?

e.g. a president is a sub-class of person, while Bill Clinton is an instance-of president

CYC does
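(For what it's worth, later WordNet releases - 2.1 onward, well after these notes - did add the distinction, and NLTK exposes it as a separate pointer type. A sketch; sense numbers may vary by version:)

    from nltk.corpus import wordnet as wn

    # Class-level is-a (sub-class-of): ordinary hypernyms
    print(wn.synset('president.n.01').hypernyms())

    # Instance-level is-a: named individuals carry instance hypernyms instead
    for s in wn.synsets('clinton'):
        print(s.name(), '->', [h.name() for h in s.instance_hypernyms()])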

there are so many different types of is-a

the same goes for has-a

what about multiple-inheritance (from the top level domains)???

book as artifact vs knowledge, or animal vs meat

'fish' as food is a different synset from 'fish' as animal - Deb wants to know whether you ever want to cross synsets without having to go via the level above
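(Deb's question in later NLTK terms, sense numbers assumed: there is no direct link between the two 'fish' synsets, so the only sanctioned route from one to the other is up to a shared ancestor and back down.)

    from nltk.corpus import wordnet as wn

    animal = wn.synset('fish.n.01')  # the animal sense
    food = wn.synset('fish.n.02')    # the food sense

    print(animal.definition())
    print(food.definition())
    # The only route between them is via the deepest ancestor they share:
    print(animal.lowest_common_hypernyms(food))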

Tom thinks that a lot of the controversial decisions (e.g. the 25-part top-level grouping) are really just implementation decisions, e.g. to make it easier to split it up into files for separate lexicographers

however, they are making a commitment to the idea of synsets, i.e. the idea that words are swappable to some degree - what happens when you meet both units simultaneously?

is the database generated by graduate students, or from corpora/tagged texts?

Federica said that apparently they weren't intending this for NLP tasks, and weren't prepared for the criticism from NLP researchers about how much ambiguity there is

they haven't added noun-verb relations yet, because it's problematic

 

Adjectives

bipolar model - most adjectives are defined by/in terms of their antonyms

gradation

e.g. saintly, good, worthy, ordinary, unworthy, evil

more than one-dimensional?

e.g. colour, taste, emotions, personality

happiness-sadness vs intensity - difficult to collapse down to one dimension

we may be off by a couple of orders of magnitude in terms of the dimensionality of some adjectives (e.g. taste)

 

Verb classes

top level, e.g. weather verbs

lexical entailment, e.g. sleeping/snoring

troponyms (a coined term) - particular ways of doing something, e.g. marching is a troponym of walking
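(Both verb relations are queryable in the later NLTK interface; a minimal sketch - the snore/sleep pair is the stock example:)

    from nltk.corpus import wordnet as wn

    # Lexical entailment: snoring entails sleeping
    print(wn.synset('snore.v.01').entailments())   # [Synset('sleep.v.01')]

    # Troponyms (manner-specific ways of doing something) are stored
    # as the hyponyms of a verb synset:
    print(wn.synset('walk.v.01').hyponyms()[:5])   # manner-of-walking verbs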

 

Two major gripes:

too many disconnected synsets, not clustered enough

useless glosses

untagged, unrelated to each other

eXtended WordNet / WordNet 2 or something is trying to solve this problem???

taxonomic issues

a lot of people use it, but is it good for anything?

often used in query expansion (e.g. in information retrieval), sense disambiguation, sense-tagging (given a 'context', find the best-fitting subtree), semantic distance, topic clustering

perhaps it'll be really good once you've already bootstrapped to a certain point

by that point, the system might just be able to read a dictionary - but WordNet does have extra, more machine-readable information
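(To make the query-expansion use concrete, a deliberately naive Python/NLTK sketch - the function name is hypothetical. Note how doing no sense disambiguation drags in every sense of every term, which is exactly the ambiguity complaint above:)

    from nltk.corpus import wordnet as wn

    def expand_query(terms):
        # Add every synonym of every sense of every term - no disambiguation,
        # so e.g. 'bank' pulls in riversides and financial institutions alike.
        expanded = set(terms)
        for term in terms:
            for synset in wn.synsets(term):
                expanded.update(l.name().replace('_', ' ')
                                for l in synset.lemmas())
        return expanded

    print(expand_query(['car']))   # auto, automobile, ... but also cable car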

 

Federica, Pustejovsky, 'The Generative Lexicon'

combining work in AI, philosophy of language, generative syntax

Tom thinks it seems to be making more claims than the WordNet people

does it rest on anything empirically???

depends on what you take to be empirical data

problems with consistency?

Generative Lexicon went through two stages:

earlier years, straight theoretical, almost Chomskyan

later, actually started to deploy the Generative Lexicon system

the paper is kind of in between those

applications of GL:

SIMPLE (EU lexical semantics project)

NLP

EuroWordNet uses the GL top ontology

Intellectual foundations

the idea of qualia structure

Julius Moravcsik (Aristotle's four causes) - the meaning of a word:

constituency (the constitutive role)

generic domain of application (the formal role)

functional element in meaning (the telic role)

causal origin (the agentive role)

 

Deb thinks there's a fifth: a theory of how it works (is this his own idea???)

e.g.:

person: theory of mind

aeroplane: naïve theory of aerodynamics

they don't think this is reflected in linguistic data, e.g. in how noun-noun compounds work, or how you account for long-distance dependencies in the syntax (???)

 

Data

adjectives sometimes can't be used in syntactically identical expressions in composition with certain nouns

e.g.

'a good rock' is ok if uttered by a climber

# 'a good cloud'

an old swimmer

a person who is a swimmer and is old

a person who has been swimming for a long time

an old fish

a fish that is old

* a fish that has been swimming for a long time

an old story

a story that was written a long time ago

* a story I have been reading for a long time

 

they want to base the interpretation on the adjective/verb

forces you to enumerate all of the different varieties of 'good'

what kind of granularity do you need to have? how do you know to rule out the ones that are funny (e.g. 'a good cloud' - what about pilots, picnics, cloud-centric cultures???)?

surely you can't enumerate all the different permissible contexts etc.???

I think that 'good' is definitely anchored to function - is this controversial???

no, this is what they go on to argue - 'good' is related to 'what I do with it'

in terms of 'old', it's more related to 'how the entity came about'
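(A toy sketch of how that might cash out - all names and lexicon entries hypothetical, not from the paper: 'good' picks up the noun's telic role, 'old' its agentive role, and a noun with no telic role gives the anomalous '#a good cloud' from above.)

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Qualia:
        constitutive: Optional[str] = None  # what it is made of
        formal: Optional[str] = None        # what kind of thing it is
        telic: Optional[str] = None         # what it is for
        agentive: Optional[str] = None      # how it came about

    # Hypothetical mini-lexicon (values are just readable strings):
    LEXICON = {
        'knife': Qualia(formal='artifact', telic='cutting',
                        agentive='manufactured'),
        'story': Qualia(formal='artifact', telic='reading',
                        agentive='written'),
        'cloud': Qualia(formal='natural object'),  # no telic role
    }

    def good(noun):
        # 'good' anchors to the telic role: good for what the thing is for
        q = LEXICON[noun]
        if q.telic is None:
            return f"# a good {noun}  (no telic role to anchor 'good')"
        return f"a {noun} that is good for {q.telic}"

    def old(noun):
        # 'old' can anchor to the agentive role: came about long ago
        q = LEXICON[noun]
        if q.agentive is None:
            return f"a {noun} that has existed for a long time"
        return f"a {noun} that was {q.agentive} a long time ago"

    print(good('knife'))   # a knife that is good for cutting
    print(good('cloud'))   # anomalous, as with '#a good cloud' above
    print(old('story'))    # a story that was written a long time ago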

Basic questions

Can you study the structure of concepts in the same way as you study syntax?

Are there generalisations that can be made about what drives lexical inference?

Can you separate analytical knowledge from contextually determined interpretations?

Can you set up an empirical basis for motivating lexical representations?

Can you capture abstractions without enumeration?

 

they look at data in a linguistic empirical way (I think???)

 

Tom thinks that you could build an intelligent system without differentiating between nouns and verbs (for example)

he thinks they�re making assumptions about meaning/language

assumption that we can get at stuff/understanding (of what???) by studying language in general

 

Towards a structure for concepts

concepts have different degrees of complexity/richness

and so different numbers of inferences

this applies across (PoS) categories

whereas WordNet uses different strategies for different categories

but I would argue that this is because they assume that we (brains) use different strategies/representations for different categories

 

the same concept can be lexicalised in one or more ways (i.e./e.g. as a noun or verb or both)

 

children's data

easier mapping between parts of speech and primitives - the linguistic data is one lens onto the concepts - perhaps you can see why certain primitive parts of speech emerge early

 

Tom argues that looking at the language alone won't tell you enough - it's a surface representation of something deeper - don't make the structure of language the end-goal if you're interested in the conceptual structure

language is not the only lens on conceptual structure

how much of WordNet could you discover just from the linguistic data?

 

language impairment

patient being completely incapable of talking about containers

e.g. 'the frog is in the ...'

 

just how generative is the GL?

generate your entire ontology from combinations of qualia

 

Pustejovsky - don't enumerate word senses, because there are too many ways in which two words can combine

 

Questions

lexical database vs knowledge base???

encoded database term-property relations vs semantic relations (smartness)

put presentations online???

IR??? information-retrieval

why is it called the 'Generative Lexicon'??? what does it mean???

when they say they�re thinking about its uses for AI, where exactly would it fit in/what would it do???

what is the difference between 'lexical' and 'semantic'??? what do you mean by the 'lexicon'???

media lab vs AI lab???